# HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation


## Installation

```shell
conda create -n HermesFlow python==3.8.10
conda activate HermesFlow
pip install -r requirements.txt
```

## Curate Homologous Preference Data

For understanding preference data:

```shell
python3 inference_mmu_caption.py config=configs/hermesflow_demo_512x512.yaml
```

For generation preference data, first generate images from the input prompts:

```shell
python3 inference_t2i.py config=configs/showo_demo_512x512.yaml batch_size=1 guidance_scale=5 generation_timesteps=50 mode='t2i'
```

Then, use the MLLM itself to run VQA evaluation on these images:

```shell
python3 inference_mmu_vqa.py config=configs/hermesflow_demo_512x512.yaml
```

Finally, get the homologous preference data for Pair-DPO using:

```shell
python3 datasets/journeydb/get_dpo_data.py
```
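
The four curation steps above can be chained into one script so that a failure in an early step stops the run. The following is only a sketch: the `run` helper, the `curate_preference_data` function, and the `DRY_RUN` variable are illustrative names that are not part of this repository. It assumes it is run from the repository root. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch: run the curation steps in order, stopping at the first failure.
set -eu

# DRY_RUN=1 (the default in this sketch) only prints each command.
# Set DRY_RUN=0 to actually execute the steps.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

curate_preference_data() {
    # 1. Understanding preference data (captioning).
    run python3 inference_mmu_caption.py config=configs/hermesflow_demo_512x512.yaml
    # 2. Generate images for the input prompts.
    run python3 inference_t2i.py config=configs/showo_demo_512x512.yaml \
        batch_size=1 guidance_scale=5 generation_timesteps=50 mode='t2i'
    # 3. VQA evaluation of the generated images by the MLLM itself.
    run python3 inference_mmu_vqa.py config=configs/hermesflow_demo_512x512.yaml
    # 4. Build the homologous preference data for Pair-DPO.
    run python3 datasets/journeydb/get_dpo_data.py
}

curate_preference_data
```

Saved as, for example, `curate.sh` (a hypothetical file name), it could be previewed with `sh curate.sh` and executed with `DRY_RUN=0 sh curate.sh`.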

## Pair-DPO Training

Use Pair-DPO to optimize the base MLLM:

```shell
accelerate launch --config_file accelerate_configs/1_gpu.yaml --main_process_port=8888 training/train_pairdpo.py config=configs/hermesflow_pairdpo.yaml
```

## Iterative Optimization

First, follow the same steps as above to curate understanding and generation preference data. Then use this script to update the homologous preference data:

```shell
python3 datasets/journeydb/get_dpo_data_iterative.py
```

Finally, use the same training command as in the Pair-DPO Training section to optimize the MLLM through Pair-DPO.
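
One round of the iterative procedure above could be scripted roughly as follows. This is a sketch under assumptions: `run`, `iterate_once`, `DRY_RUN`, and `NUM_ROUNDS` are illustrative names that are not part of this repository, and the number of rounds is arbitrary. It also assumes the configs are edited between rounds to point at the checkpoint from the previous round, since how checkpoints are selected is not specified here. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: repeat curation, preference-data update, and Pair-DPO training.
set -eu

DRY_RUN="${DRY_RUN:-1}"        # 1 = print commands only, 0 = execute them
NUM_ROUNDS="${NUM_ROUNDS:-2}"  # hypothetical number of rounds

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

iterate_once() {
    # Re-curate understanding and generation data with the current model.
    run python3 inference_mmu_caption.py config=configs/hermesflow_demo_512x512.yaml
    run python3 inference_t2i.py config=configs/showo_demo_512x512.yaml \
        batch_size=1 guidance_scale=5 generation_timesteps=50 mode='t2i'
    run python3 inference_mmu_vqa.py config=configs/hermesflow_demo_512x512.yaml
    # Update the homologous preference data.
    run python3 datasets/journeydb/get_dpo_data_iterative.py
    # Optimize the MLLM with the same Pair-DPO training command as before.
    run accelerate launch --config_file accelerate_configs/1_gpu.yaml \
        --main_process_port=8888 training/train_pairdpo.py \
        config=configs/hermesflow_pairdpo.yaml
}

round=1
while [ "$round" -le "$NUM_ROUNDS" ]; do
    echo "# round $round"
    iterate_once
    round=$((round + 1))
done
```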